Analysis of constant-Q filterbank based representations for speech emotion recognition

Abstract

This work analyzes constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER). A constant-Q filterbank provides a non-linear spectro-temporal representation with higher frequency resolution at low frequencies. Our investigation reveals how the increased low-frequency resolution benefits SER. The time-domain comparative analysis between short-term mel-frequency spectral coefficients (MFSCs) and constant-Q filterbank-based features, namely the constant-Q transform (CQT) and the continuous wavelet transform (CWT), reveals that constant-Q representations provide higher time-invariance at low frequencies. This provides increased robustness against emotion-irrelevant temporal variations in pitch, especially for low-arousal emotions. The corresponding frequency-domain analysis over different emotion classes shows better resolution of pitch harmonics in constant-Q-based time-frequency representations than in MFSC. These advantages of constant-Q representations are further consolidated by SER performance in an extensive evaluation of the features over four publicly available databases, with six advanced deep neural network architectures as back-end classifiers. The inferences of this study hint toward the suitability and potentiality of constant-Q features for SER.
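
The abstract's claim about finer low-frequency resolution can be illustrated with a small sketch. The code below is not the paper's implementation; the function names and the parameter values (32.7 Hz minimum frequency, 24 bins per octave, 40 mel filters up to 8 kHz) are assumptions chosen only for illustration. It computes geometrically spaced constant-Q bins, whose bandwidth grows in proportion to center frequency, and compares the lowest bin's bandwidth with the spacing of the first two mel filter centers (using the common 2595 log10(1 + f/700) mel formula).

```python
import math

def constant_q_bins(f_min, bins_per_octave, n_bins):
    """Return (center_hz, bandwidth_hz) pairs for geometrically spaced bins."""
    # Q = f / bandwidth is identical for every bin (hence "constant-Q").
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    centers = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    return [(f, f / q) for f in centers]

def hz_to_mel(f_hz):
    """Common (HTK-style) Hz-to-mel mapping."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(f_min, f_max, n_filters):
    """Center frequencies (Hz) of filters spaced uniformly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

if __name__ == "__main__":
    # Illustrative settings only (assumed, not taken from the paper).
    cq = constant_q_bins(f_min=32.7, bins_per_octave=24, n_bins=192)
    mel = mel_centers(f_min=32.7, f_max=8000.0, n_filters=40)
    print("lowest constant-Q bin bandwidth (Hz):", cq[0][1])
    print("highest constant-Q bin bandwidth (Hz):", cq[-1][1])
    print("spacing of first two mel centers (Hz):", mel[1] - mel[0])
```

With these assumed settings, the lowest constant-Q bin is far narrower than the gap between neighboring low-frequency mel filters, which is the property the abstract associates with better resolution of pitch harmonics. The comparison depends on the chosen bin and filter counts, so it should be read as a qualitative illustration only.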

Similar resources

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition

As a hot topic of speech signal processing, speech emotion recognition methods have been developed rapidly in recent years. Some satisfactory results have been achieved. However, it should be noted that most of these methods are trained and evaluated on the same corpus. In reality, the training data and testing data are often collected from different corpora, and the feature distributions of di...

Reusing Neural Speech Representations for Auditory Emotion Recognition

Acoustic emotion recognition aims to categorize the affective state of the speaker and is still a difficult task for machine learning models. The difficulties come from the scarcity of training data, general subjectivity in emotion perception resulting in low annotator agreement, and the uncertainty about which features are the most relevant and robust ones for classification. In this paper, we...

Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms

One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...

A Gender-Based Pragmatic Analysis of the Use of English Compliment Responses by Iraqi EFL Students: A Speech Act Perspective

Compliments are speech acts that people use in their daily lives to establish friendships or maintain peaceful relationships. The mechanism of complimenting is not specific to English or any other language; it is a universal phenomenon present in all languages. The difference among languages and cultures in this respect lies in how this speech act is responded to in discourse. This study examines the variety of English and Arabic responses to the speech act of complimenting...

Journal

Journal title: Digital Signal Processing

Year: 2022

ISSN: 1051-2004, 1095-4333

DOI: https://doi.org/10.1016/j.dsp.2022.103712